An Adaptive LDA Optimal Topic Number Selection Method in News Topic Identification
Authors
Abstract
Nowadays, news text information is exploding, and people need more heterogeneous content. Therefore, topic identification is needed to help viewers quickly and accurately screen and filter the news related to their interests, saving time and energy. The Latent Dirichlet Allocation (LDA) model is the most commonly used method for topic identification. In previous studies, the optimal number of topics must be specified in advance when using LDA to extract topics. However, selecting a too-large or too-small number of topics significantly impacts the final results of topic models and directly determines the quality of topic extraction. Moreover, news datasets from social media are very time-sensitive, and the combination of temporal and semantic modelling has not been considered in past studies. This paper proposes an adaptive method for determining the optimal number of LDA topics that fuses temporal and semantic information to address these problems. Semantic and temporal information is first extracted as two different views. Then, density peak clustering and multi-view clustering are performed on the obtained feature vectors to determine the optimal number of topics. To demonstrate the effectiveness of the proposed method, the paper compares the performance of four traditional methods for determining the optimal number of topics with the paper's method on public datasets. The results show that the method considering temporal factors performs better than the other methods regarding F-value, PMI scores, and MI scores. It performs well on other indicators as well. The experimental results show that the proposed method, which combines temporal and semantic data to determine the optimal number of topics in news text, can improve the accuracy of selecting the optimal number of topics to some extent and can help people better understand and utilize massive news information. In addition, the paper broadens the idea of identifying and mining unique topics from multiple perspectives.
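The abstract names density peak clustering as the step that yields the number of topics but gives no details. The following is a minimal single-view sketch of that general technique, not the paper's algorithm: it omits the multi-view fusion of temporal and semantic views entirely. The function name `estimate_cluster_count`, the Gaussian density kernel, the 2nd-percentile cutoff distance, and the rule for picking centers are all assumptions made for illustration, and the input is synthetic data standing in for document feature vectors.

```python
import numpy as np


def estimate_cluster_count(X, dc_percentile=2.0):
    """Estimate the number of clusters as the number of density peaks.

    X is an (n_samples, n_features) array of feature vectors.
    Returns (k, rho, delta). Thresholds here are illustrative assumptions.
    """
    n = X.shape[0]
    # Pairwise Euclidean distances between all points.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    # Cutoff distance d_c: a low percentile of the off-diagonal distances.
    upper = np.triu_indices(n, k=1)
    dc = np.percentile(dist[upper], dc_percentile)

    # Local density rho with a Gaussian kernel (subtract the self term).
    rho = np.exp(-((dist / dc) ** 2)).sum(axis=1) - 1.0

    # delta: distance to the nearest point of higher density.
    # The densest point gets its largest distance to any other point.
    order = np.argsort(-rho)
    delta = np.zeros(n)
    delta[order[0]] = dist[order[0]].max()
    for rank in range(1, n):
        i = order[rank]
        delta[i] = dist[i, order[:rank]].min()

    # Assumed selection rule: a center has above-median density and a
    # delta larger than half of the largest delta.
    is_center = (delta > 0.5 * delta.max()) & (rho > np.median(rho))
    return int(is_center.sum()), rho, delta


# Synthetic stand-in for document feature vectors: three separated blobs.
rng = np.random.default_rng(0)
blob_centers = np.array([[0.0, 0.0], [10.0, 0.0], [5.0, 8.66]])
X_demo = np.vstack(
    [c + 0.5 * rng.standard_normal((50, 2)) for c in blob_centers]
)
k_demo, _, _ = estimate_cluster_count(X_demo)
print("estimated number of clusters:", k_demo)
```

In a setting like the one the abstract describes, the estimated count would then be passed to LDA as its number of topics. On real text features the selection rule would likely need tuning, since clusters are rarely as cleanly separated as in this toy data.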
Similar resources
News Selection with Topic Modeling
Numerous news articles arrive at news aggregators, and the important ones are selected for presentation on the front page. There are two types of news selection for the front page of news aggregators: personalized and public news recommendation (selection). This study examines public news recommendation, which aims to satisfy all users' interests on the front page. Public news recommendation i...
Dynamic Threshold Selection Method for Multi-label Newspaper Topic Identification
Nowadays, the multi-label classification is increasingly required in modern categorization systems. It is especially essential in the task of newspaper article topics identification. This paper presents a method based on general topic model normalisation for finding a threshold defining the boundary between the “correct” and the “incorrect” topics of a newspaper article. The proposed method is ...
Topic detection in broadcast news
We propose a system for the Topic Detection and Tracking (TDT) detection task concerned with the unsupervised grouping of news stories according to topic. We use an incremental k-means algorithm for clustering stories. For comparing stories, we utilize a probabilistic document similarity metric and a traditional vector-space metric. We note that the clustering algorithm requires two differ...
Topic extraction with multiple topic-words in broadcast-news speech
This paper reports on topic extraction in Japanese broadcast-news speech. We studied, using continuous speech recognition, the extraction of several topic-words from broadcast-news. A combination of multiple topic-words represents the content of the news. This is a more detailed and more flexible approach than using a single word or a single category. A topic-extraction model shows the degree of...
Journal
Journal title: IEEE Access
Year: 2023
ISSN: 2169-3536
DOI: https://doi.org/10.1109/access.2023.3308520